1. INTRODUCTION

The purpose of this paper is to present several FORTRAN subroutines for updating the QR decomposition of a matrix. Let $A \in \mathbb{R}^{m \times n}$, $m \ge n$, have a QR decomposition $A = QR$, where $Q \in \mathbb{R}^{m \times n}$ has orthonormal columns and $R \in \mathbb{R}^{n \times n}$ is upper triangular. Assume that the elements of $Q$ and $R$ are explicitly known. Let $\tilde{A} \in \mathbb{R}^{p \times q}$, $p \ge q$, be obtained from $A$ by inserting or deleting a row or a column, or let $\tilde{A}$ be a rank-one modification of $A$, i.e., $\tilde{A} = A + vu^T$, where $u \in \mathbb{R}^n$, $v \in \mathbb{R}^m$. Then a QR decomposition of $\tilde{A}$, $\tilde{A} = \tilde{Q}\tilde{R}$, where $\tilde{Q} \in \mathbb{R}^{p \times q}$ has orthonormal columns and $\tilde{R} \in \mathbb{R}^{q \times q}$ is upper triangular, can be computed in $O(mn)$ arithmetic operations by updating $Q$ and $R$; see Daniel et al. [5]. The updating is done by applying Givens reflectors. The operation count for updating $Q$ and $R$ compares favorably with the $O(mn^2)$ arithmetic operations necessary to compute a QR decomposition of a general $m \times n$ matrix.
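Givens reflectors are the basic building block of every update described in this paper. As a minimal sketch only (in Python rather than the FORTRAN of the paper, with hypothetical helper names `givens_reflector` and `apply_reflector`), one common way to construct a 2×2 reflector that annihilates the second of two entries is:

```python
import math


def givens_reflector(a, b):
    """Return (c, s, r) such that the symmetric 2x2 reflector
    [[c, s], [s, -c]] maps the pair (a, b) to (r, 0), r = hypot(a, b)."""
    r = math.hypot(a, b)
    if r == 0.0:
        # Nothing to annihilate; use the reflector diag(1, -1).
        return 1.0, 0.0, 0.0
    return a / r, b / r, r


def apply_reflector(c, s, x, y):
    """Apply [[c, s], [s, -c]] to the pair (x, y).

    Applied entry by entry to two rows (or two columns) of length n,
    this costs O(n) operations."""
    return c * x + s * y, s * x - c * y


# Annihilate the second component of (3, 4).
c, s, r = givens_reflector(3.0, 4.0)
x1, y1 = apply_reflector(c, s, 3.0, 4.0)
```

The reflector is symmetric and orthogonal, hence its own inverse. Because it combines only two rows or two columns at a time, its cost is linear in their length, for example $O(m)$ when it is applied to two columns of $Q$.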

Algol procedures for computing $\tilde{Q}$ and $\tilde{R}$ from $Q$ and $R$ are presented by Daniel et al. [5]. Buckley [2] translated these procedures into FORTRAN. Our FORTRAN subroutines implement modifications of the Algol procedures in [5]. These modifications speed up the subroutines and make them suitable for use on vector computers. This is illustrated by timing experiments.

Several program libraries, such as LINPACK [6] and NAG [14], provide subroutines for updating $R$ only, but contain no routines for updating the complete QR decomposition. Advantages of updating both $Q$ and $R$ include that downdating can be carried out stably and that the individual elements of projections are easily accessible; see LINPACK [6, p. 10.23], Daniel et al. [5], and Stewart [17].


The first comprehensive survey of updating algorithms was presented by Gill et al. [8], and a recent discussion with references to applications can be found in Golub and Van Loan [10, Section 12.6]. The applications include linear least squares problems, regression analysis, and the solution of nonlinear systems of equations; see Allen [1], Goldfarb [9], Gragg and Stewart [11], and Moré and Sorensen [13]. The algorithms would also appear to be applicable to recursive least squares problems in signal processing; see Ling et al. [12].

We also present a subroutine that implements the rank-revealing QR decomposition method recently proposed by Chan [3]. In this method the QR decomposition $A = QR$ is updated to yield the QR decomposition $\tilde{A} = \tilde{Q}\tilde{R}$, where $\tilde{A}$ is obtained from $A$ by a column permutation. This permutation is selected so that, in general, the element(s) in the lower right corner of $\tilde{R}$ are small if $A$ has nearly linearly dependent columns. The subroutine can be used to solve the subset selection problem; see Golub and Van Loan [10]. Table 1.1 lists the FORTRAN subroutines for updating the QR decomposition. All subroutines use double precision arithmetic and are written in FORTRAN 77. Section 2 contains programming details for the subroutines of Table 1.1 and for certain auxiliary subprograms. For all subroutines of Table 1.1 except DRRPM, the numerical method as well as Algol procedures have been presented in [5]. For these subroutines we only discuss the differences between our FORTRAN subroutines and the Algol procedures. These differences stem in part from speeding up the algorithms and in part from the use of simple subroutines, the BLAS, for elementary vector and matrix operations.


Subroutine    Purpose

DDELC         Computes $\tilde{Q}, \tilde{R}$ from $Q, R$ when $\tilde{A}$ is obtained
              from $A$ by deleting a column; see [5].

DDELR         Computes $\tilde{Q}, \tilde{R}$ from $Q, R$ when $\tilde{A}$ is obtained
              from $A$ by deleting a row; see [5].

DINSC         Computes $\tilde{Q}, \tilde{R}$ from $Q, R$ when $\tilde{A}$ is obtained
              from $A$ by inserting a column; see [5].

DINSR         Computes $\tilde{Q}, \tilde{R}$ from $Q, R$ when $\tilde{A}$ is obtained
              from $A$ by inserting a row; see [5].

DRNK1         Computes $\tilde{Q}, \tilde{R}$ from $Q, R$ when $\tilde{A}$ is a rank-one
              modification of $A$; see [5].

DRRPM         Computes $\tilde{Q}, \tilde{R}$ from $Q, R$ when $\tilde{A}$ is obtained by
              permuting the columns of $A$ in a manner that generally
              reveals whether columns of $A$ are nearly linearly
              dependent; see [3].

Table 1.1: Subroutines for updating a QR decomposition $A = QR$
to yield a QR decomposition $\tilde{A} = \tilde{Q}\tilde{R}$.


The BLAS are discussed in Section 3. They have been written to vectorize efficiently on an IBM 3090-200VF computer using the vectorizing compiler VS FORTRAN 2.3.0 without special compiler directives. Most BLAS were obtained by modifying the LINPACK BLAS [6]. We hope that the provided BLAS also vectorize well on other vector computers, without excessive increases in execution time. Section 4 contains output from a driver illustrating the use of the subroutines, as well as some timing results. A listing of the source code of the driver is provided in the Appendix.


2. THE UPDATING SUBROUTINES

We consider the subroutines of Table 1.1 in order. These subroutines use auxiliary subroutines, which we need to introduce first. They are listed in Table 2.1.


Auxiliary     Called by
subroutine    subroutine      Purpose

DORTHO        DINSC, DRNK1    Compute $s := Q^T w$, $v := (I - QQ^T)w$, with
                              reorthogonalization, for an arbitrary vector $w$.

DORTHX        DDELR           Compute $s := Q^T e_j$, $v := (I - QQ^T)e_j$, with
                              reorthogonalization, for an axis vector $e_j$.

DINVIT        DRRPM           Compute an approximation of a right singular vector
                              corresponding to a least singular value of $R$. A
                              first approximation is obtained from the LINPACK
                              condition number estimator DTRCO and is improved by
                              inverse iteration.

DTRLSL        DINVIT          Solve a lower triangular linear system of equations
                              with frequent rescalings in order to avoid overflow.
                              Similar to part of DTRCO.

DTRUSL        DINVIT          Solve an upper triangular linear system of equations
                              with frequent rescalings in order to avoid overflow.
                              Similar to part of DTRCO.

Table 2.1: Auxiliary subroutines


2.1 Subroutines DORTHO and DORTHX


Given a matrix $Q \in \mathbb{R}^{m \times n}$, $m > n$, with orthonormal columns and a vector $w \in \mathbb{R}^m$, the subroutine DORTHO computes the Fourier coefficients $s := Q^T w$ and the orthogonal projection of $w$ into the null-space of $Q^T$, $v := (I - QQ^T)w$. At most one reorthogonalization is carried out. Since the subroutine DORTHO differs from the corresponding Algol procedure "orthogonalize" in [5], we discuss DORTHO and its use in some detail.

Subroutine DORTHO is called by routine DINSC, which updates the QR factorization of a matrix $A = QR \in \mathbb{R}^{m \times n}$, $m > n$, when a column $w$ is inserted into $A$. Updating may not be meaningful if $w$ is nearly a linear combination of the columns of $Q$. Therefore DORTHO computes the condition number of the matrix $\hat{Q} := [Q, w/\|w\|] \in \mathbb{R}^{m \times (n+1)}$, where $\|\cdot\|$ denotes the Euclidean norm. Using $Q^T Q = I$, we obtain the following expressions for the singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{n+1}$ of $\hat{Q}$:

$$\sigma_1 = \left(1 + \|Q^T w\|/\|w\|\right)^{1/2}, \tag{2.1a}$$

$$\sigma_j = 1, \qquad 2 \le j \le n, \tag{2.1b}$$

$$\sigma_{n+1} = \left(1 - \|Q^T w\|/\|w\|\right)^{1/2}. \tag{2.1c}$$

Further, for $v := (I - QQ^T)w/\|w\|$,

$$\|v\| = \sigma_1 \sigma_{n+1}. \tag{2.2}$$

Since $1 \le \sigma_1 \le \sqrt{2}$, (2.2) shows that $\sigma_{n+1}$ is also an accurate estimate, to within a factor $\sqrt{2}$, of the length of the orthogonal projection of $w/\|w\|$ into the null-space of $Q^T$. In order to avoid severe cancellation of significant digits in (2.1c), we first determine $\sigma_1$ from (2.1a) and then $\sigma_{n+1}$ from (2.2).
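The formulas (2.1) follow because $\hat{Q}^T\hat{Q}$ equals the identity except for the off-diagonal block $Q^T w/\|w\|$, so its eigenvalues are $1 \pm \|Q^T w\|/\|w\|$ and $1$ (the latter with multiplicity $n-1$). The Python sketch below (a hypothetical helper, not DORTHO itself) computes $\sigma_1$ from (2.1a) and $\sigma_{n+1}$ from (2.2), and contrasts this with evaluating (2.1c) directly for a $w$ that is nearly in the range of $Q$.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def extreme_singular_values(Q, w):
    """Q: list of n orthonormal columns (each a list of length m); w: nonzero vector.

    Returns (sigma_1, sigma_{n+1}) of Qhat = [Q, w/||w||], using (2.1a) for
    sigma_1 and (2.2) for sigma_{n+1}, so 1 - ||Q^T w||/||w|| is never formed."""
    omega = math.sqrt(dot(w, w))
    u = [x / omega for x in w]                    # u = w/||w||
    c = [dot(q, u) for q in Q]                    # c = Q^T u
    gamma = math.sqrt(dot(c, c))                  # gamma = ||Q^T w||/||w||
    v = list(u)
    for q, cj in zip(Q, c):                       # v = (I - QQ^T) u
        for i in range(len(v)):
            v[i] -= cj * q[i]
    sigma_1 = math.sqrt(1.0 + gamma)              # (2.1a): no cancellation
    sigma_np1 = math.sqrt(dot(v, v)) / sigma_1    # (2.2): ||v|| = sigma_1 * sigma_{n+1}
    return sigma_1, sigma_np1


Q = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]            # columns e_1, e_2 of R^3
s1, sn = extreme_singular_values(Q, [1.0, 0.0, 1e-10])
naive = math.sqrt(1.0 - 1.0 / math.sqrt(1.0 + 1e-20))   # (2.1c) evaluated directly
```

In IEEE double precision `naive` evaluates to 0.0, because $1 + 10^{-20}$ rounds to 1, whereas `sn` is about $7.07 \cdot 10^{-11}$, which is the correct value $10^{-10}/\sqrt{2}$.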

Subroutines DINSC and DORTHO have an input parameter RCOND, which is a lower bound for the reciprocal condition number. The computations are discontinued and an error flag is set if $\sigma_{n+1}/\sigma_1 \le \mathrm{RCOND}$. On exit, $\mathrm{RCOND} := \sigma_{n+1}/\sigma_1$.


Assume now that the input value of RCOND satisfies $\mathrm{RCOND} < \sigma_{n+1}/\sigma_1$. Then DORTHO computes $s := Q^T w$ and $v := (I - QQ^T)w$ by a scheme analogous to the method described by Parlett [15, p. 107] for orthogonalizing a vector against another vector. For definiteness, we present the orthogonalization scheme. References to $\sigma_1$, $\sigma_{n+1}$, and RCOND are omitted for simplicity.

Orthogonalization algorithm: input $Q \in \mathbb{R}^{m \times n}$ ($Q$ has orthonormal columns), $m$, $n$ ($m > n$), $w \in \mathbb{R}^m$ ($w \ne 0$); output $v$ ($v = (I - QQ^T)w$ normalized to unit length, or $v = 0$), $s$ ($s = Q^T w$);

    ω := ||w||;  w := w/ω;
    s := Q^T w;  v := w - Qs;                                     (2.3)
    if ||v|| > 0.707 then
        v := v/||v||;  s := sω;  exit;             * ||v|| = 1, Q^T v = 0 *
    s' := Q^T v;  v' := v - Qs';                                  (2.4)
    if ||v'|| < 0.707||v|| then
        * w lies in span{Q} numerically *
        v := 0;  s := (s + s')ω;  set flag;  exit;
    v := v'/||v'||;  s := (s + s')ω;  exit;        * ||v|| = 1, Q^T v = 0 *
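For experimentation, the orthogonalization algorithm can be transcribed into Python roughly as follows. This is a sketch, not DORTHO: the names `orthogonalize` and `project_out` are hypothetical, the RCOND test is omitted as in the algorithm itself, and the second comparison uses ≤ rather than < so that an exactly dependent $w$ (for which $\|v\| = 0$) is flagged instead of causing a division by zero. The threshold 0.707 is approximately $1/\sqrt{2}$.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def project_out(Q, x):
    """Return (Q^T x, x - Q Q^T x): all inner products first, then one update."""
    s = [dot(q, x) for q in Q]
    r = [xi - sum(sj * q[i] for sj, q in zip(s, Q)) for i, xi in enumerate(x)]
    return s, r


def orthogonalize(Q, w, alpha=0.707):
    """Scheme (2.3)-(2.4) with at most one reorthogonalization.

    Q: list of orthonormal columns; w: nonzero vector.
    Returns (v, s, in_span): s approximates Q^T w; v is a unit vector
    orthogonal to the columns of Q, or the zero vector when in_span is True."""
    omega = math.sqrt(dot(w, w))
    w = [x / omega for x in w]
    s, v = project_out(Q, w)                      # (2.3)
    vn = math.sqrt(dot(v, v))
    if vn > alpha:
        return [x / vn for x in v], [x * omega for x in s], False
    s2, v2 = project_out(Q, v)                    # (2.4): reorthogonalize once
    v2n = math.sqrt(dot(v2, v2))
    s = [(a + b) * omega for a, b in zip(s, s2)]
    if v2n <= alpha * vn:                         # "<=" also catches vn == 0
        return [0.0] * len(v), s, True            # w numerically in span{Q}
    return [x / v2n for x in v2], s, False
```

For example, with $Q = [e_1, e_2]$ in $\mathbb{R}^3$, `orthogonalize(Q, [1.0, 0.0, 1e-9])` takes the reorthogonalization branch and still accepts the small component along $e_3$, returning $v \approx e_3$.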

The proof in Parlett [15, pp. 107-108] that one reorthogonalization suffices carries over to the present algorithm, using that $Q^T Q = I$. We note that there are other ways to carry out the computations on lines (2.3)-(2.4). In [5], $v$ and $v'$ are updated immediately after each component of $s$ is computed. Our scheme has the advantage of being faster on vector computers, since it allows matrix-vector operations, and it is also, generally, more accurate, since rounding errors accumulate less. The latter can easily be shown, and we omit the details.
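The difference between the two ways of organizing lines (2.3)-(2.4) can be seen in the Python sketch below. We assume, without reproducing the Algol text of [5], that its component-by-component update has the form of a modified Gram-Schmidt sweep; the function names are hypothetical. In exact arithmetic both functions return the same $s$ and $v$ when $Q$ has orthonormal columns.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def project_block(Q, w):
    """Organization used in (2.3): every coefficient is taken from the same
    vector w (a matrix-vector product Q^T w), followed by a single update
    v = w - Q s (a second matrix-vector product)."""
    s = [dot(q, w) for q in Q]
    v = [wi - sum(sj * q[i] for sj, q in zip(s, Q)) for i, wi in enumerate(w)]
    return s, v


def project_componentwise(Q, w):
    """Assumed component-by-component variant: each coefficient is taken from
    the current v, and v is updated before the next coefficient is computed,
    so the inner products must be formed one after another."""
    v = list(w)
    s = []
    for q in Q:
        sj = dot(q, v)
        s.append(sj)
        v = [vi - sj * qi for vi, qi in zip(v, q)]
    return s, v


h = 1.0 / math.sqrt(2.0)
Q = [[h, h, 0.0], [h, -h, 0.0]]                   # two orthonormal columns in R^3
sb, vb = project_block(Q, [1.0, 2.0, 3.0])
sc, vc = project_componentwise(Q, [1.0, 2.0, 3.0])
```

The block form consists of two matrix-vector products with no dependence between the inner products, which is what makes it attractive on a vector computer.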

We turn to subroutine DORTHX. This is a faster version of subroutine DORTHO: DORTHX assumes that $w$ in the orthogonalization algorithm is an axis vector $e_j$. Then $\|w\| = 1$ and $Q^T e_j$ is simply the $j$th row of $Q$, so no inner products are needed, which simplifies the computations in (2.3). DORTHX may perform nearly twice as fast as DORTHO.
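A minimal Python sketch of this simplification (hypothetical name `axis_projection`, 0-based index $j$; the normalization and single reorthogonalization of the general algorithm would follow unchanged and are left out):

```python
def axis_projection(Q, j):
    """Sketch of step (2.3) for an axis vector w = e_j.

    Q: list of n orthonormal columns of length m; j: 0-based index.
    Returns (s, v) with s = Q^T e_j and v = e_j - Q s."""
    m = len(Q[0])
    s = [q[j] for q in Q]              # Q^T e_j is row j of Q: no inner products
    v = [(1.0 if i == j else 0.0) - sum(sk * q[i] for sk, q in zip(s, Q))
         for i in range(m)]
    return s, v
```

Compared with the general case, the matrix-vector product $Q^T w$ is replaced by reading one row of $Q$.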

2.2 Subroutines DINVIT, DTRLSL and DTRUSL

Given a nonsingular upper triangular matrix $U = [\mu_{jk}] \in \mathbb{R}^{n \times n}$ and a vector $b = [\beta_j] \in \mathbb{R}^n$, DTRUSL solves $Ux = b\rho$, where $\rho$, $|\rho| \le 1$, is a scaling factor. The scaling factor is introduced in order to avoid overflow when solving very ill-conditioned linear systems of equations. DTRLSL is an analogous subroutine for lower triangular systems.
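The exact rescaling rule of DTRUSL is not reproduced here. As an illustration of the idea only, the Python sketch below (hypothetical name `scaled_upper_solve`) uses one simple rule that we assume for the example: whenever the next solution component would exceed 1 in magnitude, the whole partial solution and $\rho$ are scaled down together, so that $Ux = \rho b$ keeps holding. A lower triangular analogue in the spirit of DTRLSL would run the loop forwards.

```python
def scaled_upper_solve(U, b):
    """Sketch: solve U x = rho * b by back substitution for a nonsingular upper
    triangular U (list of rows), choosing 0 < rho <= 1 so that every computed
    |x_j| <= 1. The partial solution and rho are rescaled together, so that
    U x = rho * b continues to hold after each rescaling."""
    n = len(b)
    x = [float(bi) for bi in b]        # running right-hand side, overwritten by x
    rho = 1.0
    for j in range(n - 1, -1, -1):
        d = U[j][j]
        if abs(x[j]) > abs(d):
            f = abs(d) / abs(x[j])     # shrink so that |x[j] / d| becomes 1
            x = [xi * f for xi in x]
            rho *= f
        x[j] /= d
        for i in range(j):             # eliminate x[j] from the rows above
            x[i] -= U[i][j] * x[j]
    return x, rho
```

For the very ill-conditioned matrix with $10^{-8}$ on the diagonal and ones above it, the unscaled solution of $Ux = b$ with $b = (1, 1, 1)^T$ has components of size about $10^{24}$; the sketch instead returns components bounded by 1 together with $\rho \approx 10^{-24}$.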

DTRLSL and DTRUSL are called by DINVIT, a subroutine for computing an approximation of a right singular vector belonging to a least singular value of an upper triangular matrix $R$. If $R$ is singular, then such a singular vector is computed by solving a triangular linear system of equations. Otherwise an initial approximate right singular vector $a^{(0)} = [\alpha_k^{(0)}]_{k=1}^{n}$ is obtained by the LINPACK condition number estimator DTRCO, and inverse iteration with $R^T R$ is used to obtain improved approximations $a^{(j)} = [\alpha_k^{(j)}]_{k=1}^{n}$, $j = 1, 2, \ldots, \mathrm{NMBIT}$, where NMBIT is an input parameter to DINVIT and DRRPM. On exit from DINVIT and DRRPM, IPOS(j) contains the least index $k$ such that $|\alpha_k^{(j)}| \ge |\alpha_l^{(j)}|$ for $1 \le l \le n$, where $0 \le j \le \mathrm{NMBIT}$. On return from DINVIT and DRRPM the parameter DELTA is given by $\mathrm{DELTA} := \|R^T R a^{(\mathrm{NMBIT})}\| / \|a^{(\mathrm{NMBIT})}\|$. Hence, DELTA is an upper bound for the least singular value of $R$.
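The inverse iteration can be sketched in Python as follows. Assumptions, all ours: the starting vector is supplied by the caller instead of being computed by DTRCO; no rescaling is done, so unlike DTRLSL and DTRUSL the sketch could overflow; and the returned `delta` is the ratio $\|Ra\|/\|a\|$, which is always an upper bound for the least singular value of $R$ because that value is the minimum of $\|Rx\|/\|x\|$ over nonzero $x$. It is not claimed to be the DELTA computed by DINVIT. All names are hypothetical; `ipos[j]` mirrors IPOS(j) with 0-based indices.

```python
import math


def solve_lower_t(R, a):
    """Solve R^T y = a by forward substitution (R upper triangular, list of rows)."""
    n = len(a)
    y = [0.0] * n
    for i in range(n):
        y[i] = (a[i] - sum(R[k][i] * y[k] for k in range(i))) / R[i][i]
    return y


def solve_upper(R, y):
    """Solve R z = y by back substitution."""
    n = len(y)
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (y[i] - sum(R[i][k] * z[k] for k in range(i + 1, n))) / R[i][i]
    return z


def argmax_abs(a):
    """Least index k with |a[k]| >= |a[l]| for all l."""
    best = 0
    for k in range(1, len(a)):
        if abs(a[k]) > abs(a[best]):
            best = k
    return best


def inverse_iteration(R, a0, nmbit):
    """nmbit steps of inverse iteration with R^T R, starting from a0."""
    a = list(a0)
    ipos = [argmax_abs(a)]
    for _ in range(nmbit):
        z = solve_upper(R, solve_lower_t(R, a))    # z = (R^T R)^{-1} a
        nz = math.sqrt(sum(x * x for x in z))
        a = [x / nz for x in z]
        ipos.append(argmax_abs(a))
    n = len(a)
    ra = [sum(R[i][k] * a[k] for k in range(i, n)) for i in range(n)]
    delta = math.sqrt(sum(x * x for x in ra)) / math.sqrt(sum(x * x for x in a))
    return a, ipos, delta


a, ipos, delta = inverse_iteration([[1.0, 0.0], [0.0, 1e-3]], [1.0, 1.0], 1)
```

Here a single step already moves the largest component to the second position (`ipos == [0, 1]`), and `delta` is close to the least singular value $10^{-3}$.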
2.3 Updating subroutines

We are now in a position to consider the subroutines of Table 1.1. The vectorization is mainly done in the BLAS of the next section, but some loops of the subroutines of Table 1.1 vectorize as well. Comments in the source code indicate which loops vectorize, or are eligible for vectorization, on an IBM 3090-200VF computer with compiler VS FORTRAN 2.3.0 when the default vectorization directives are used. For applications to particular problem classes, changing the default vectorization by compiler directives may decrease the execution time.

We list the differences between the subroutines of Table 1.1 and the corresponding Algol procedures of [5]. Some of these modifications were suggested in [5] but not implemented in the Algol procedures there. In subroutine DDELC, computing the column deleted from $A := QR$ is optional; not computing this column saves $O(mn)$ arithmetic operations. In subroutine DDELR, the auxiliary subroutine DORTHX is used instead of DORTHO; as indicated in Section 2.1, the former subroutine may perform nearly twice as fast. In subroutine DINSC, a column $w$ is inserted into $A := QR$ only if the reciprocal condition number of the matrix $[Q, w/\|w\|]$ is larger than a bound given by the parameter RCOND on entry. The parameter RCOND can be used to prevent updating when $w/\|w\|$ is nearly in the range of $Q$. Finally, DRNK1 performs slightly faster if the updated matrix $A + vu^T$ is such that $v$ lies numerically in the range of $A$.

The subroutine DRRPM implements an algorithm presented by Chan [3]. The computation of an approximate right singular vector corresponding to a least singular value is done by subroutine DINVIT and has already been discussed. The position of a component of largest magnitude of this singular vector has to be determined, and we found, in agreement with Chan's suggestion [3], that two inverse iterations suffice. In fact, in all computed examples, one inverse iteration was sufficient, even for problems with multiple or close least singular values. The subroutine permutes the order of columns 1 through $k$ of $A\Pi$, where $k$ is an input parameter, $A \in \mathbb{R}^{m \times n}$, $m \ge n$, and $\Pi$ is a permutation matrix. DRRPM is typically called with $k = n, n-1, n-2, \ldots$ until no further permutation is made or until the computed upper bound DELTA for the least singular value of the matrix consisting of the first $k$ columns of $A\Pi$ is not small.
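In Chan's method the column to be moved is, presumably, the one indexed by the largest component of the approximate singular vector (IPOS). The Python sketch below shows one way such a cyclic column move can be absorbed into $R$: after column $p$ is moved to position $k$, each of the columns that shifted left has a single nonzero subdiagonal entry, which a sequence of Givens reflectors removes. This illustrates the general updating technique and is not the code of DRRPM; the function name is hypothetical, indices are 0-based, and only $R$ is updated here, whereas a full update would apply the same reflectors to the columns of $Q$.

```python
import math


def move_column_and_retriangularize(R, p, k):
    """Move column p of the leading k columns of the upper triangular R
    (list of rows) to position k-1, then restore triangular form with
    2x2 Givens reflectors acting on adjacent rows. Returns (H, order)."""
    ncols = len(R[0])
    order = list(range(p)) + list(range(p + 1, k)) + [p] + list(range(k, ncols))
    H = [[row[c] for c in order] for row in R]   # upper Hessenberg in columns p..k-2
    for j in range(p, k - 1):
        a, b = H[j][j], H[j + 1][j]
        r = math.hypot(a, b)
        if r == 0.0:
            continue
        c, s = a / r, b / r
        for col in range(j, ncols):              # reflector [[c, s], [s, -c]] on rows j, j+1
            x, y = H[j][col], H[j + 1][col]
            H[j][col], H[j + 1][col] = c * x + s * y, s * x - c * y
        H[j + 1][j] = 0.0                        # zero in exact arithmetic
    return H, order


R = [[1.0, 2.0, 3.0], [0.0, 4.0, 5.0], [0.0, 0.0, 6.0]]
H, order = move_column_and_retriangularize(R, 0, 3)
```

Since the reflectors are orthogonal, $H^T H$ equals $\Pi^T R^T R \Pi$, so $H$ is a valid triangular factor of the permuted matrix.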

The subroutines of Table 1.1 neither require nor produce a factorization in which the diagonal elements of the upper triangular matrix are nonnegative.

3. THE BLAS

Much computational experience on a variety of computers led Dongarra and Sorensen [7] to conclude that nearly optimal performance of numerical linear algebra subroutines can be achieved if the subroutines for the basic matrix and vector operations, such as multiplication, addition, and inner product computation, are written to perform well on vector computers. We wanted to write code that performs well on an IBM 3090-200VF computer and that would not require excessive tuning when moved to other (vector) computers. Therefore we designed the code to vectorize well without special compiler instructions, since the latter would be machine dependent.
With the VS FORTRAN 2.3.0 compiler, unnecessary vector loads and stores are avoided by introducing a temporary scalar variable, denoted by ACC in subroutine DAPX of Example 3.1. During execution, ACC should be thought of as a vector variable stored in a vector register. Timings for DAPX, and comparisons with code with explicitly unrolled loops, have been carried out by Robert and Squazzero [16]. These timings show that subroutine DAPX performs better than equivalent subroutines with explicitly unrolled loops.

Example 3.1. Subroutine for matrix vector multiplication.

      SUBROUTINE DAPX(A,LDA,M,N,X,Y)
C
C     DAPX COMPUTES Y:=A*X.
C
      INTEGER LDA,M,N,I,J
      REAL*8 A(LDA,N),X(N),Y(M),ACC
C
C     OUTER LOOP VECTORIZES.
C
      DO 10 I=1,M
         ACC=0D0
         DO 20 J=1,N
            ACC=ACC+A(I,J)*X(J)
   20    CONTINUE
         Y(I)=ACC
   10 CONTINUE
      RETURN
      END

Temporary scalar variables have also been used in other BLAS among the 17 that our subroutines use.


4. COMPUTED EXAMPLES

Example 4.1. In this example the QR decomposition of a $4 \times 3$ matrix $A$ is updated. The use of all subroutines of Table 1.1 is illustrated. The main program producing this output is listed in the Appendix.
Example 4.2. Execution times for subroutines DDELC and DRNK1 are compared for scalar and vector arithmetic. The measured cpu times differed somewhat between different executions of the same code. Therefore the reported times are rounded to one significant digit, and the quotients of measured cpu times are rounded to the nearest multiple of 1/2.
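The rounding conventions can be stated precisely with a small Python sketch. The helper names are hypothetical, and since the text does not say how ties were handled, rounding ties upward is our assumption.

```python
import math


def round_to_half(q):
    """Round a positive quotient to the nearest multiple of 1/2 (ties go up)."""
    return math.floor(2.0 * q + 0.5) / 2.0


def one_significant_digit(t):
    """Format a positive time in seconds with one significant digit."""
    return f"{t:.0e}"
```

Because each quotient is formed from the measured, unrounded times, it need not equal the quotient of the rounded times shown in the tables.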

Table 4.1 shows the cpu times for DDELC. This routine and its subroutines have been compiled with the VS FORTRAN 2.3.0 compiler. The times for vector arithmetic are obtained from code generated with compiler option vlev = 2, which makes the compiler generate code that uses the vector registers and vector arithmetic. The times for scalar arithmetic are obtained from code generated with compiler option vlev = 0, which makes the compiler generate code that does not use vector instructions. Given a QR decomposition of a matrix $A \in \mathbb{R}^{m \times n}$, Table 4.1 shows the cpu time required by DDELC to compute the QR decomposition of the matrix $\tilde{A} \in \mathbb{R}^{m \times (n-1)}$ obtained by deleting column one of $A$.

                     cpu time in seconds
   m     n    scalar arithmetic   vector arithmetic   scalar time / vector time
  10    10        4·10^-4             4·10^-4                  1
  20    10        4·10^-4             4·10^-4                  1
  30    10        5·10^-4             4·10^-4                  1.5
  50    10        6·10^-4             4·10^-4                  1.5
  75    10        8·10^-4             4·10^-4                  2
 128    10        1·10^-3             5·10^-4                  2
1024    10        7·10^-3             2·10^-3                  3.5
1280    10        9·10^-3             3·10^-3                  3.5

Table 4.1: Timings for DDELC

Table 4.2 is similar to Table 4.1 and contains execution times for DRNK1. The reduction in execution time obtained by using vector instructions is of the same order of magnitude for the other updating routines as well.

                     cpu time in seconds
   m     n    scalar arithmetic   vector arithmetic   scalar time / vector time
  16    12        1·10^-3             1·10^-3                  1
  32    25        4·10^-3             3·10^-3                  1.5
  64    50        2·10^-2             7·10^-3                  2
 128   100        6·10^-2             2·10^-2                  2.5
1024   100        4·10^-1             8·10^-2                  4.5
1250   100        5·10^-1             9·10^-1                  5

Table 4.2: Timings for DRNK1


Example 4.3. Execution times for the subroutines written by Buckley [2] and those of Table 1.1 are compared. The vectorized and scalar codes were generated as explained in Example 4.2. We found that vectorization of the subroutines in [2] did not change their execution times significantly, generally by less than 20%. In all computed examples the vectorized subroutines in [2] required at least twice as much execution time as the vectorized subroutines of Table 1.1. For certain problems our vectorized code executed up to 95 times faster than the vectorized code in [2]. For scalar code the differences in execution time often decreased with increasing matrix size. Tables 4.3-4.6 present some sample timings.
   m    time for DDELR   time for DELROW [2]   time for DELROW [2] / time for DDELR
  10       3·10^-5            9·10^-4                    3.5·10^1
  64       7·10^-5            2·10^-3                    2·10^1
 128       8·10^-4            3·10^-3                    3
1024       4·10^-3            2·10^-2                    4

Table 4.3: The first row of $A = QR$ is deleted. Cpu times for vectorized code for updating $Q$ and $R$ are given in seconds; $n = 10$.

   m     n   time for DDELC   time for DELCOL [2]   time for DELCOL [2] / time for DDELC
1024    10      1·10^-4            2·10^-4                    2
1024   100      1·10^-4            7·10^-3                    7.5·10^1
1280   100      1·10^-4            9·10^-3                    9.5·10^1

Table 4.4: The last column of $A = QR$ is deleted. Cpu times for vectorized code for updating $Q$ and $R$ are given in seconds. DDELC does not compute the last column of $A$, i.e., IFLAG = 0 on entry.

   m    time for DINSC   time for INSCOL [2]   time for INSCOL [2] / time for DINSC
  64       1·10^-3            2·10^-3                    2
 128       1·10^-3            3·10^-3                    2
1024       7·10^-3            2·10^-2                    2.5

Table 4.5: A new first column is inserted into $A = QR$. Cpu times for vectorized code for updating $Q$ and $R$ are given in seconds; $n = 10$.


Tables 4.3-4.5 present timings for vectorized code. Table 4.6 shows timings for scalar code for the same updates as in Table 4.3: without vectorization, DELROW [2] requires about 50% more cpu time than DDELR for moderately large problems.
   m    time for DDELR   time for DELROW [2]   time for DELROW [2] / time for DDELR
  10       2·10^-5            6·10^-4                    3·10^1
  64       1·10^-3            2·10^-3                    1.5
 128       2·10^-3            3·10^-3                    1.5
1024       1·10^-2            2·10^-2                    1.5

Table 4.6: The first row of $A = QR$ is deleted. Cpu times for scalar code for updating $Q$ and $R$ are given in seconds; $n = 10$.